Delayed reward-based genetic algorithms for partially observable Markov decision problems
Abstract
Reinforcement learning usually assumes that the environment has the Markov property. However, an agent cannot always observe its environment completely, and in such cases distinct states are perceived as the same state. In this research, the authors develop a Delayed Reward-based Genetic Algorithm for POMDP (DRGA) to solve partially observable Markov decision problems (POMDPs) that suffer from this perceptual aliasing. The DRGA breaks the POMDP down into several subtasks and solves it by breaking the agent down into several subagents. Each subagent acquires policies adapted to the environment from the delayed rewards it receives, and these policies are evolved by a genetic algorithm driven by the same delayed rewards. The agent adapts to the environment by combining the effective policies that survive natural selection. The authors apply the method to maze search problems with limited perception to demonstrate its validity. © 2004 Wiley Periodicals, Inc. Syst Comp Jpn, 35(2): 66–78, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10230
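The following is a minimal Python sketch of the general idea only, not the authors' DRGA: it omits the decomposition into subtasks and subagents entirely. It assumes a hypothetical corridor maze, an agent that perceives only the four adjacent wall flags, and a plain genetic algorithm over observation-to-action tables whose fitness is a reward delivered only when the goal is reached. The maze layout, function names, and parameter values are all illustrative assumptions.

```python
import random

# Hypothetical maze (not from the paper): '#' wall, 'S' start, 'G' goal.
MAZE = ["######",
        "#S...#",
        "####.#",
        "####G#",
        "######"]
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # action indices: 0=N, 1=E, 2=S, 3=W


def find(ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(MAZE):
        if ch in row:
            return r, row.index(ch)


def observe(pos):
    """Limited perception: only the wall flags of the four neighbours (N, E, S, W)."""
    r, c = pos
    return tuple(MAZE[r + dr][c + dc] == "#" for dr, dc in MOVES)


# Every observation the agent can ever receive in this maze.
OBSERVATIONS = sorted({observe((r, c))
                       for r, row in enumerate(MAZE)
                       for c, ch in enumerate(row) if ch != "#"})


def delayed_reward(policy, max_steps=20):
    """Run one episode; the only reward comes at the goal, as 1 / steps taken."""
    pos, goal = find("S"), find("G")
    for step in range(1, max_steps + 1):
        dr, dc = MOVES[policy[observe(pos)]]
        nxt = (pos[0] + dr, pos[1] + dc)
        if MAZE[nxt[0]][nxt[1]] != "#":  # bumping into a wall leaves the agent in place
            pos = nxt
        if pos == goal:
            return 1.0 / step
    return 0.0


def evolve(pop_size=100, generations=100, mutation_rate=0.1, seed=0):
    """Evolve observation->action tables using the delayed reward as fitness."""
    rng = random.Random(seed)
    pop = [{o: rng.randrange(4) for o in OBSERVATIONS} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=delayed_reward, reverse=True)
        survivors = pop[:pop_size // 2]  # truncation selection keeps the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Uniform crossover: each table entry is copied from one of the parents.
            child = {o: rng.choice((a[o], b[o])) for o in OBSERVATIONS}
            for o in OBSERVATIONS:
                if rng.random() < mutation_rate:  # point mutation of one entry
                    child[o] = rng.randrange(4)
            children.append(child)
        pop = survivors + children
    return max(pop, key=delayed_reward)


if __name__ == "__main__":
    # Cells (1, 2) and (1, 3) are different states but look identical to the agent.
    print("aliased:", observe((1, 2)) == observe((1, 3)))
    best = evolve()
    print("delayed reward of best policy:", delayed_reward(best))
```

In this maze the two aliased corridor cells happen to require the same action, so a single memoryless table can reach the goal. When aliased cells require different actions, no such table works, and that harder case is the kind of perceptual aliasing problem the abstract addresses.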
Related articles
A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can either increase or decrease a system's availability, so it is valuable to evaluate a maintenance policy from the cost and availability points of view simultaneously, according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems
We investigate the computability of problems in probabilistic planning and partially observable infinite-horizon Markov decision processes. The undecidability of the string-existence problem for probabilistic finite automata is adapted to show that the following problem of plan existence in probabilistic planning is undecidable: given a probabilistic planning problem, determine whether there ex...
On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems
We investigate the computability of problems in probabilistic planning and partially observable infinite-horizon Markov decision processes. The undecidability of the string-existence problem for probabilistic finite automata is adapted to show that the following problem of plan existence in probabilistic planning is undecidable: given a probabilistic planning problem, determine whether there exist...
On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems
We investigate the computability of problems in probabilistic planning and partially observable infinite-horizon Markov decision processes. The undecidability of the string-existence problem for probabilistic finite automata is adapted to show that the following problem of plan existence in probabilistic planning is undecidable: given a probabilistic planning problem, determine whether there exist...
A POMDP Extension with Belief-dependent Rewards
Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly implies reducing the uncertainty on the state. To that end, we introduce ρPOMDPs, an extension of POMDPs where the reward func...
Journal title:
- Systems and Computers in Japan
Volume 35, Issue 2
Pages 66–78
Publication date 2004